Overview

Dataset statistics

Number of variables: 19
Number of observations: 99958
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.5 MiB
Average record size in memory: 152.0 B
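
The two memory figures agree with each other: the average record size multiplied by the number of observations gives roughly the reported total. A minimal Python check, using only the numbers listed above (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
n_observations = 99958
avg_record_bytes = 152.0

# Total footprint in bytes, then converted to MiB (1 MiB = 2**20 bytes).
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20

print(f"{total_mib:.1f} MiB")  # prints "14.5 MiB"
```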

Variable types

Numeric: 13
Categorical: 6

Warnings

df_index is highly correlated with primary_key (High correlation)
primary_key is highly correlated with df_index (High correlation)
voice_oob_mean is highly correlated with voice_oob_sum (High correlation)
voice_oob_sum is highly correlated with voice_oob_mean (High correlation)
voice_oob_nat_mean is highly correlated with voice_oob_nat_sum (High correlation)
voice_oob_nat_sum is highly correlated with voice_oob_nat_mean (High correlation)
primary_key is uniformly distributed (Uniform)
df_index has unique values (Unique)
primary_key has unique values (Unique)
r_age_val has unique values (Unique)
count_orange has 83374 (83.4%) zeros (Zeros)
voice_oob_mean has 65925 (66.0%) zeros (Zeros)
voice_oob_sum has 65925 (66.0%) zeros (Zeros)
voice_oob_nat_mean has 80337 (80.4%) zeros (Zeros)
voice_oob_nat_sum has 80337 (80.4%) zeros (Zeros)

Reproduction

Analysis started: 2021-01-16 21:41:37.003491
Analysis finished: 2021-01-16 21:42:25.890091
Duration: 48.89 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 99958
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50580.26431
Minimum: 0
Maximum: 100999
Zeros: 1
Zeros (%): < 0.1%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4998.85
Q1: 24993.25
median: 50993.5
Q3: 75997.75
95-th percentile: 96000.15
Maximum: 100999
Range: 100999
Interquartile range (IQR): 51004.5

Descriptive statistics

Standard deviation: 29291.26667
Coefficient of variation (CV): 0.5791046581
Kurtosis: -1.215053109
Mean: 50580.26431
Median Absolute Deviation (MAD): 25502.5
Skewness: -0.008149656078
Sum: 5055902060
Variance: 857978303.2
Monotonicity: Strictly increasing
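
Several of the statistics listed for df_index are derived from the others: the coefficient of variation is the standard deviation divided by the mean, the interquartile range is Q3 minus Q1, the range is maximum minus minimum, and the variance is the squared standard deviation. A minimal Python sketch that recomputes them from the reported values (the variable names are illustrative):

```python
import math

# Values reported for df_index in this report.
mean = 50580.26431
std = 29291.26667
q1, q3 = 24993.25, 75997.75
minimum, maximum = 0, 100999

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance

# Compare against the reported figures, allowing for display rounding.
print(math.isclose(cv, 0.5791046581, rel_tol=1e-6))
print(iqr, value_range)
print(math.isclose(variance, 857978303.2, rel_tol=1e-6))
```

The same relationships should hold for every numeric variable in the report, up to rounding of the displayed digits.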
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
84668 | 1 | < 0.1%
4759 | 1 | < 0.1%
27288 | 1 | < 0.1%
25241 | 1 | < 0.1%
31386 | 1 | < 0.1%
29339 | 1 | < 0.1%
19100 | 1 | < 0.1%
17053 | 1 | < 0.1%
23198 | 1 | < 0.1%
Other values (99948) | 99948 | > 99.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Value | Count | Frequency (%)
100999 | 1 | < 0.1%
100998 | 1 | < 0.1%
100997 | 1 | < 0.1%
100996 | 1 | < 0.1%
100995 | 1 | < 0.1%

primary_key
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 99958
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49997.85928
Minimum: 1
Maximum: 100000
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4999.85
Q1: 24994.25
median: 49994.5
Q3: 74998.75
95-th percentile: 95001.15
Maximum: 100000
Range: 99999
Interquartile range (IQR): 50004.5

Descriptive statistics

Standard deviation: 28869.17893
Coefficient of variation (CV): 0.5774083
Kurtosis: -1.200138451
Mean: 49997.85928
Median Absolute Deviation (MAD): 25002.5
Skewness: 0.0001647137656
Sum: 4997686018
Variance: 833429492.1
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
39622 | 1 | < 0.1%
76464 | 1 | < 0.1%
74417 | 1 | < 0.1%
80562 | 1 | < 0.1%
78515 | 1 | < 0.1%
68276 | 1 | < 0.1%
66229 | 1 | < 0.1%
72374 | 1 | < 0.1%
70327 | 1 | < 0.1%
Other values (99948) | 99948 | > 99.9%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Value | Count | Frequency (%)
100000 | 1 | < 0.1%
99999 | 1 | < 0.1%
99998 | 1 | < 0.1%
99997 | 1 | < 0.1%
99996 | 1 | < 0.1%

r_age_val
Real number (ℝ≥0)

UNIQUE

Distinct: 99958
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.498642977
Minimum: 3.23667191 × 10⁻⁵
Maximum: 0.9999880234
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 3.23667191 × 10⁻⁵
5-th percentile: 0.04917355964
Q1: 0.2493645223
median: 0.497055489
Q3: 0.7497351963
95-th percentile: 0.9499201727
Maximum: 0.9999880234
Range: 0.9999556567
Interquartile range (IQR): 0.500370674

Descriptive statistics

Standard deviation: 0.2889375317
Coefficient of variation (CV): 0.5794477113
Kurtosis: -1.200737397
Mean: 0.498642977
Median Absolute Deviation (MAD): 0.2501880743
Skewness: 0.006012226328
Sum: 49843.35469
Variance: 0.08348489725
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.1840802424 | 1 | < 0.1%
0.7935401013 | 1 | < 0.1%
0.458713362 | 1 | < 0.1%
0.8191695667 | 1 | < 0.1%
0.1122393482 | 1 | < 0.1%
0.7174767267 | 1 | < 0.1%
0.1196117112 | 1 | < 0.1%
0.5353033876 | 1 | < 0.1%
0.1366209837 | 1 | < 0.1%
0.2337465684 | 1 | < 0.1%
Other values (99948) | 99948 | > 99.9%

Value | Count | Frequency (%)
3.23667191 × 10⁻⁵ | 1 | < 0.1%
4.471326247 × 10⁻⁵ | 1 | < 0.1%
5.069980398 × 10⁻⁵ | 1 | < 0.1%
8.817738853 × 10⁻⁵ | 1 | < 0.1%
9.196507744 × 10⁻⁵ | 1 | < 0.1%

Value | Count | Frequency (%)
0.9999880234 | 1 | < 0.1%
0.9999297063 | 1 | < 0.1%
0.9998888529 | 1 | < 0.1%
0.9998853074 | 1 | < 0.1%
0.9998679473 | 1 | < 0.1%

cust_gender_cd
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.0 KiB

F: 50154
M: 45738
Unknown: 4066

Length

Max length: 7
Median length: 1
Mean length: 1.244062506
Min length: 1

Characters and Unicode

Total characters: 124354
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: F
2nd row: M
3rd row: F
4th row: M
5th row: F

Value | Count | Frequency (%)
F | 50154 | 50.2%
M | 45738 | 45.8%
Unknown | 4066 | 4.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
f | 50154 | 50.2%
m | 45738 | 45.8%
unknown | 4066 | 4.1%

Most occurring characters

Value | Count | Frequency (%)
F | 50154 | 40.3%
M | 45738 | 36.8%
n | 12198 | 9.8%
U | 4066 | 3.3%
k | 4066 | 3.3%
o | 4066 | 3.3%
w | 4066 | 3.3%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 99958 | 80.4%
Lowercase Letter | 24396 | 19.6%

Most frequent character per category

Value | Count | Frequency (%)
n | 12198 | 50.0%
k | 4066 | 16.7%
o | 4066 | 16.7%
w | 4066 | 16.7%

Value | Count | Frequency (%)
F | 50154 | 50.2%
M | 45738 | 45.8%
U | 4066 | 4.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 124354 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
F | 50154 | 40.3%
M | 45738 | 36.8%
n | 12198 | 9.8%
U | 4066 | 3.3%
k | 4066 | 3.3%
o | 4066 | 3.3%
w | 4066 | 3.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 124354 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
F | 50154 | 40.3%
M | 45738 | 36.8%
n | 12198 | 9.8%
U | 4066 | 3.3%
k | 4066 | 3.3%
o | 4066 | 3.3%
w | 4066 | 3.3%

cust_language_cd
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.0 KiB

FR: 53841
NL: 45606
EN: 374
DE: 137

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 199916
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: FR
2nd row: FR
3rd row: FR
4th row: FR
5th row: NL

Value | Count | Frequency (%)
FR | 53841 | 53.9%
NL | 45606 | 45.6%
EN | 374 | 0.4%
DE | 137 | 0.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
fr | 53841 | 53.9%
nl | 45606 | 45.6%
en | 374 | 0.4%
de | 137 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
F | 53841 | 26.9%
R | 53841 | 26.9%
N | 45980 | 23.0%
L | 45606 | 22.8%
E | 511 | 0.3%
D | 137 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 199916 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
F | 53841 | 26.9%
R | 53841 | 26.9%
N | 45980 | 23.0%
L | 45606 | 22.8%
E | 511 | 0.3%
D | 137 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 199916 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
F | 53841 | 26.9%
R | 53841 | 26.9%
N | 45980 | 23.0%
L | 45606 | 22.8%
E | 511 | 0.3%
D | 137 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 199916 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
F | 53841 | 26.9%
R | 53841 | 26.9%
N | 45980 | 23.0%
L | 45606 | 22.8%
E | 511 | 0.3%
D | 137 | 0.1%

cust_mkt_segm_desc
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.0 KiB

MASS: 92900
SOHO: 7058

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 399832
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MASS
2nd row: MASS
3rd row: MASS
4th row: MASS
5th row: MASS

Value | Count | Frequency (%)
MASS | 92900 | 92.9%
SOHO | 7058 | 7.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
mass | 92900 | 92.9%
soho | 7058 | 7.1%

Most occurring characters

Value | Count | Frequency (%)
S | 192858 | 48.2%
M | 92900 | 23.2%
A | 92900 | 23.2%
O | 14116 | 3.5%
H | 7058 | 1.8%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 399832 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
S | 192858 | 48.2%
M | 92900 | 23.2%
A | 92900 | 23.2%
O | 14116 | 3.5%
H | 7058 | 1.8%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 399832 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
S | 192858 | 48.2%
M | 92900 | 23.2%
A | 92900 | 23.2%
O | 14116 | 3.5%
H | 7058 | 1.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 399832 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
S | 192858 | 48.2%
M | 92900 | 23.2%
A | 92900 | 23.2%
O | 14116 | 3.5%
H | 7058 | 1.8%

trf_mdl_phonedeal_cd
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.0 KiB

0: 62964
1: 36994

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 99958
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Value | Count | Frequency (%)
0 | 62964 | 63.0%
1 | 36994 | 37.0%

Histogram of lengths of the category

Value | Count | Frequency (%)
0 | 62964 | 63.0%
1 | 36994 | 37.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 62964 | 63.0%
1 | 36994 | 37.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 99958 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 62964 | 63.0%
1 | 36994 | 37.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 99958 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 62964 | 63.0%
1 | 36994 | 37.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 99958 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 62964 | 63.0%
1 | 36994 | 37.0%

count_orange
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.1896796655
Minimum: 0
Maximum: 6
Zeros: 83374
Zeros (%): 83.4%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.4549128731
Coefficient of variation (CV): 2.398321781
Kurtosis: 7.808397147
Mean: 0.1896796655
Median Absolute Deviation (MAD): 0
Skewness: 2.598609855
Sum: 18960
Variance: 0.2069457221
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
0 | 83374 | 83.4%
1 | 14457 | 14.5%
2 | 1906 | 1.9%
3 | 200 | 0.2%
4 | 15 | < 0.1%
5 | 5 | < 0.1%
6 | 1 | < 0.1%

Value | Count | Frequency (%)
0 | 83374 | 83.4%
1 | 14457 | 14.5%
2 | 1906 | 1.9%
3 | 200 | 0.2%
4 | 15 | < 0.1%

Value | Count | Frequency (%)
6 | 1 | < 0.1%
5 | 5 | < 0.1%
4 | 15 | < 0.1%
3 | 200 | 0.2%
2 | 1906 | 1.9%

cust_total_mobile_qty
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.0 KiB

1: 63875
2: 22980
3: 9035
4: 4067
6: 1

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 99958
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 2
5th row: 1

Value | Count | Frequency (%)
1 | 63875 | 63.9%
2 | 22980 | 23.0%
3 | 9035 | 9.0%
4 | 4067 | 4.1%
6 | 1 | < 0.1%

Histogram of lengths of the category

Value | Count | Frequency (%)
1 | 63875 | 63.9%
2 | 22980 | 23.0%
3 | 9035 | 9.0%
4 | 4067 | 4.1%
6 | 1 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
1 | 63875 | 63.9%
2 | 22980 | 23.0%
3 | 9035 | 9.0%
4 | 4067 | 4.1%
6 | 1 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 99958 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
1 | 63875 | 63.9%
2 | 22980 | 23.0%
3 | 9035 | 9.0%
4 | 4067 | 4.1%
6 | 1 | < 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 99958 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
1 | 63875 | 63.9%
2 | 22980 | 23.0%
3 | 9035 | 9.0%
4 | 4067 | 4.1%
6 | 1 | < 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 99958 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 63875 | 63.9%
2 | 22980 | 23.0%
3 | 9035 | 9.0%
4 | 4067 | 4.1%
6 | 1 | < 0.1%

voice_oob_mean
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 23763
Distinct (%): 23.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.764173275
Minimum: 0
Maximum: 375.3951333
Zeros: 65925
Zeros (%): 66.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0.775958335
95-th percentile: 9.468136669
Maximum: 375.3951333
Range: 375.3951333
Interquartile range (IQR): 0.775958335

Descriptive statistics

Standard deviation: 6.325627466
Coefficient of variation (CV): 3.585604406
Kurtosis: 442.1400149
Mean: 1.764173275
Median Absolute Deviation (MAD): 0
Skewness: 14.10221597
Sum: 176343.2322
Variance: 40.01356284
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 65925 | 66.0%
0.2755 | 1395 | 1.4%
0.17906667 | 767 | 0.8%
0.551 | 433 | 0.4%
0.55096667 | 274 | 0.3%
0.8265 | 205 | 0.2%
0.41323333 | 199 | 0.2%
0.35813333 | 167 | 0.2%
1.102 | 85 | 0.1%
0.45456667 | 80 | 0.1%
Other values (23753) | 30428 | 30.4%

Value | Count | Frequency (%)
0 | 65925 | 66.0%
0.00093333 | 1 | < 0.1%
0.00113333 | 4 | < 0.1%
0.00183333 | 1 | < 0.1%
0.0023 | 5 | < 0.1%

Value | Count | Frequency (%)
375.3951333 | 1 | < 0.1%
353.3198667 | 1 | < 0.1%
275.4740667 | 1 | < 0.1%
243.3725333 | 1 | < 0.1%
237.3467 | 1 | < 0.1%

voice_oob_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 23674
Distinct (%): 23.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.265834626
Minimum: 0
Maximum: 1126.1854
Zeros: 65925
Zeros (%): 66.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 2.29465
95-th percentile: 28.35906
Maximum: 1126.1854
Range: 1126.1854
Interquartile range (IQR): 2.29465

Descriptive statistics

Standard deviation: 18.83123087
Coefficient of variation (CV): 3.576115129
Kurtosis: 439.4605764
Mean: 5.265834626
Median Absolute Deviation (MAD): 0
Skewness: 13.99401962
Sum: 526362.2975
Variance: 354.6152561
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 65925 | 66.0%
0.8265 | 1429 | 1.4%
0.5372 | 779 | 0.8%
1.653 | 450 | 0.5%
1.6529 | 281 | 0.3%
1.2397 | 202 | 0.2%
2.4795 | 190 | 0.2%
1.0744 | 170 | 0.2%
3.306 | 87 | 0.1%
1.3637 | 80 | 0.1%
Other values (23664) | 30365 | 30.4%

Value | Count | Frequency (%)
0 | 65925 | 66.0%
0.0028 | 1 | < 0.1%
0.0034 | 4 | < 0.1%
0.0055 | 1 | < 0.1%
0.0069 | 5 | < 0.1%

Value | Count | Frequency (%)
1126.1854 | 1 | < 0.1%
1059.9596 | 1 | < 0.1%
826.4222 | 1 | < 0.1%
730.1176 | 1 | < 0.1%
712.0196 | 1 | < 0.1%
< 0.1%

voice_oob_nat_mean
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17160
Distinct (%): 17.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.087095392
Minimum: 0
Maximum: 275.4740667
Zeros: 80337
Zeros (%): 80.4%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 6.494221669
Maximum: 275.4740667
Range: 275.4740667
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.475306891
Coefficient of variation (CV): 4.116756382
Kurtosis: 377.5297469
Mean: 1.087095392
Median Absolute Deviation (MAD): 0
Skewness: 12.70083205
Sum: 108663.8811
Variance: 20.02837177
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 80337 | 80.4%
0.06886667 | 13 | < 0.1%
0.0551 | 11 | < 0.1%
0.04476667 | 10 | < 0.1%
0.1056 | 9 | < 0.1%
0.12856667 | 9 | < 0.1%
0.12396667 | 8 | < 0.1%
0.17446667 | 8 | < 0.1%
0.13773333 | 8 | < 0.1%
0.07576667 | 8 | < 0.1%
Other values (17150) | 19537 | 19.5%

Value | Count | Frequency (%)
0 | 80337 | 80.4%
0.00093333 | 1 | < 0.1%
0.00113333 | 4 | < 0.1%
0.00183333 | 1 | < 0.1%
0.0023 | 6 | < 0.1%

Value | Count | Frequency (%)
275.4740667 | 1 | < 0.1%
243.3725333 | 1 | < 0.1%
169.75 | 1 | < 0.1%
163.5384 | 1 | < 0.1%
136.4082667 | 1 | < 0.1%

voice_oob_nat_sum
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17138
Distinct (%): 17.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.254846711
Minimum: 0
Maximum: 826.4222
Zeros: 80337
Zeros (%): 80.4%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 19.43995
Maximum: 826.4222
Range: 826.4222
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 13.39464728
Coefficient of variation (CV): 4.115292814
Kurtosis: 380.0509556
Mean: 3.254846711
Median Absolute Deviation (MAD): 0
Skewness: 12.72397772
Sum: 325347.9675
Variance: 179.4165757
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 80337 | 80.4%
0.2066 | 13 | < 0.1%
0.1653 | 11 | < 0.1%
0.1343 | 10 | < 0.1%
0.3857 | 9 | < 0.1%
0.3168 | 9 | < 0.1%
0.4132 | 8 | < 0.1%
0.4408 | 8 | < 0.1%
0.2273 | 8 | < 0.1%
0.3719 | 8 | < 0.1%
Other values (17128) | 19537 | 19.5%

Value | Count | Frequency (%)
0 | 80337 | 80.4%
0.0028 | 1 | < 0.1%
0.0034 | 4 | < 0.1%
0.0055 | 1 | < 0.1%
0.0069 | 6 | < 0.1%

Value | Count | Frequency (%)
826.4222 | 1 | < 0.1%
730.1176 | 1 | < 0.1%
509.25 | 1 | < 0.1%
490.6152 | 1 | < 0.1%
409.2248 | 1 | < 0.1%

mean_bill_rev_vs_trf_plan
Real number (ℝ≥0)

Distinct: 70560
Distinct (%): 70.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.102395808
Minimum: 0
Maximum: 39.50738889
Zeros: 349
Zeros (%): 0.3%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.7438
Q1: 0.8319972225
median: 0.910143611
Q3: 1.150698167
95-th percentile: 2.015644283
Maximum: 39.50738889
Range: 39.50738889
Interquartile range (IQR): 0.3187009444

Descriptive statistics

Standard deviation: 0.6514355205
Coefficient of variation (CV): 0.5909270658
Kurtosis: 319.3943029
Mean: 1.102395808
Median Absolute Deviation (MAD): 0.083698611
Skewness: 10.76682523
Sum: 110193.2802
Variance: 0.4243682374
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0.8264466667 | 3531 | 3.5%
0.8264416667 | 2409 | 2.4%
0.82645 | 2295 | 2.3%
0.826445 | 1592 | 1.6%
0.99174 | 984 | 1.0%
0.826444 | 838 | 0.8%
0.72727 | 456 | 0.5%
0.8308911113 | 377 | 0.4%
0.8264457143 | 366 | 0.4%
0 | 349 | 0.3%
Other values (70550) | 86761 | 86.8%

Value | Count | Frequency (%)
0 | 349 | 0.3%
0.0002527775 | 1 | < 0.1%
0.002475 | 1 | < 0.1%
0.002533333333 | 1 | < 0.1%
0.005555555833 | 1 | < 0.1%

Value | Count | Frequency (%)
39.50738889 | 1 | < 0.1%
33.10309778 | 1 | < 0.1%
29.73575333 | 1 | < 0.1%
27.03434533 | 1 | < 0.1%
19.176725 | 1 | < 0.1%

tenure_days
Real number (ℝ≥0)

Distinct: 7085
Distinct (%): 7.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2619.986714
Minimum: 88
Maximum: 7461
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 88
5-th percentile: 207
Q1: 772
median: 2054
Q3: 4412
95-th percentile: 6233.15
Maximum: 7461
Range: 7373
Interquartile range (IQR): 3640

Descriptive statistics

Standard deviation: 2050.001721
Coefficient of variation (CV): 0.7824473725
Kurtosis: -1.073057737
Mean: 2619.986714
Median Absolute Deviation (MAD): 1535
Skewness: 0.5220895246
Sum: 261888632
Variance: 4202507.055
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
202 | 147 | 0.1%
422 | 145 | 0.1%
406 | 135 | 0.1%
415 | 112 | 0.1%
181 | 102 | 0.1%
401 | 95 | 0.1%
411 | 94 | 0.1%
408 | 94 | 0.1%
107 | 93 | 0.1%
410 | 92 | 0.1%
Other values (7075) | 98849 | 98.9%

Value | Count | Frequency (%)
88 | 6 | < 0.1%
90 | 3 | < 0.1%
91 | 8 | < 0.1%
93 | 3 | < 0.1%
95 | 8 | < 0.1%

Value | Count | Frequency (%)
7461 | 1 | < 0.1%
7460 | 1 | < 0.1%
7459 | 1 | < 0.1%
7456 | 1 | < 0.1%
7455 | 1 | < 0.1%

days_since_moving
Real number (ℝ≥0)

Distinct: 358
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 358.6777246
Minimum: 4
Maximum: 365
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 4
5-th percentile: 365
Q1: 365
median: 365
Q3: 365
95-th percentile: 365
Maximum: 365
Range: 361
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 37.25369475
Coefficient of variation (CV): 0.1038639765
Kurtosis: 45.43473743
Mean: 358.6777246
Median Absolute Deviation (MAD): 0
Skewness: -6.604567379
Sum: 35852708
Variance: 1387.837772
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
365 | 96180 | 96.2%
294 | 44 | < 0.1%
181 | 26 | < 0.1%
237 | 24 | < 0.1%
301 | 24 | < 0.1%
286 | 24 | < 0.1%
321 | 23 | < 0.1%
209 | 23 | < 0.1%
207 | 22 | < 0.1%
312 | 22 | < 0.1%
Other values (348) | 3546 | 3.5%

Value | Count | Frequency (%)
4 | 6 | < 0.1%
5 | 12 | < 0.1%
6 | 8 | < 0.1%
7 | 5 | < 0.1%
8 | 14 | < 0.1%

Value | Count | Frequency (%)
365 | 96180 | 96.2%
364 | 18 | < 0.1%
363 | 16 | < 0.1%
362 | 5 | < 0.1%
361 | 5 | < 0.1%

nb_cont
Real number (ℝ≥0)

Distinct: 479
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 45.58085396
Minimum: 0
Maximum: 1847
Zeros: 182
Zeros (%): 0.2%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 21
median: 35
Q3: 57
95-th percentile: 115
Maximum: 1847
Range: 1847
Interquartile range (IQR): 36

Descriptive statistics

Standard deviation: 42.74962571
Coefficient of variation (CV): 0.9378855813
Kurtosis: 85.21080245
Mean: 45.58085396
Median Absolute Deviation (MAD): 17
Skewness: 5.350697573
Sum: 4556171
Variance: 1827.530498
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
17 | 1848 | 1.8%
25 | 1847 | 1.8%
24 | 1841 | 1.8%
22 | 1831 | 1.8%
27 | 1831 | 1.8%
29 | 1819 | 1.8%
21 | 1804 | 1.8%
28 | 1802 | 1.8%
20 | 1782 | 1.8%
30 | 1772 | 1.8%
Other values (469) | 81781 | 81.8%

Value | Count | Frequency (%)
0 | 182 | 0.2%
1 | 246 | 0.2%
2 | 358 | 0.4%
3 | 449 | 0.4%
4 | 566 | 0.6%

Value | Count | Frequency (%)
1847 | 1 | < 0.1%
1216 | 1 | < 0.1%
1194 | 1 | < 0.1%
1155 | 1 | < 0.1%
1113 | 1 | < 0.1%

avg_tp_churn
Real number (ℝ≥0)

Distinct: 41
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.02891477422
Minimum: 0.004684040198
Maximum: 0.05323896678
Zeros: 0
Zeros (%): 0.0%
Memory size: 781.0 KiB

Quantile statistics

Minimum: 0.004684040198
5-th percentile: 0.008277404922
Q1: 0.02475870751
median: 0.03072007864
Q3: 0.03514511531
95-th percentile: 0.03903508772
Maximum: 0.05323896678
Range: 0.04855492658
Interquartile range (IQR): 0.0103864078

Descriptive statistics

Standard deviation: 0.009096067923
Coefficient of variation (CV): 0.3145820145
Kurtosis: 0.8724151247
Mean: 0.02891477422
Median Absolute Deviation (MAD): 0.005507416861
Skewness: -0.5189241464
Sum: 2890.263001
Variance: 8.273845167 × 10⁻⁵
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=41)
Value | Count | Frequency (%)
0.03659559741 | 12907 | 12.9%
0.03226293791 | 12673 | 12.7%
0.02408960309 | 9470 | 9.5%
0.02537772379 | 7220 | 7.2%
0.02521266178 | 5704 | 5.7%
0.02868947462 | 4916 | 4.9%
0.03514511531 | 4185 | 4.2%
0.03634149216 | 3537 | 3.5%
0.01284422111 | 3530 | 3.5%
0.02977013642 | 3279 | 3.3%
Other values (31) | 32537 | 32.6%

Value | Count | Frequency (%)
0.004684040198 | 2194 | 2.2%
0.007636178517 | 779 | 0.8%
0.008277404922 | 2706 | 2.7%
0.01284422111 | 3530 | 3.5%
0.01437371663 | 455 | 0.5%

Value | Count | Frequency (%)
0.05323896678 | 1479 | 1.5%
0.04661548497 | 2658 | 2.7%
0.04658106898 | 237 | 0.2%
0.04119318182 | 188 | 0.2%
0.03903508772 | 653 | 0.7%

churned
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 781.0 KiB

0: 97756
1: 2202

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 99958
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value | Count | Frequency (%)
0 | 97756 | 97.8%
1 | 2202 | 2.2%

Histogram of lengths of the category

Value | Count | Frequency (%)
0 | 97756 | 97.8%
1 | 2202 | 2.2%

Most occurring characters

Value | Count | Frequency (%)
0 | 97756 | 97.8%
1 | 2202 | 2.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 99958 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 97756 | 97.8%
1 | 2202 | 2.2%

Most occurring scripts

Value | Count | Frequency (%)
Common | 99958 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 97756 | 97.8%
1 | 2202 | 2.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 99958 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 97756 | 97.8%
1 | 2202 | 2.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
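
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` and the toy data are illustrative and not taken from the report), the covariance is divided by the product of the standard deviations. The common 1/n factors cancel, so sums of deviations are enough:

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy data: y is an exact positive multiple of x, so r should be close to +1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))
```

Shifting either variable, or rescaling it by a positive factor, leaves the result unchanged, which is the invariance property mentioned above.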

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
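
A minimal sketch of this procedure in plain Python, assuming tied values receive the average of their positions (a common convention; the helper names and toy data are illustrative):

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y = x**2 is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input the rank correlation is essentially 1 while Pearson's r is below 1, which illustrates why ρ is better suited to nonlinear monotonic relationships.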

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
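
The pair-counting definition can be sketched directly in plain Python. This is the simple form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant; library implementations commonly use a tie-corrected variant instead, so results may differ when ties are present. The function name and toy data are illustrative:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Toy data: two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs is fine for a small illustration but would be slow on a dataset of this report's size.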

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
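
A minimal sketch of a bias-corrected Cramér's V in plain Python, following the correction usually attributed to Bergsma (2013). The exact code used to generate this report may differ, and the function name and toy data are illustrative assumptions:

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    r, k = len(row), len(col)

    # Pearson chi-squared statistic from observed and expected cell counts.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction of phi squared and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

# Toy data: the second label is fully determined by the first.
xs = ["a", "b"] * 50
ys = ["x", "y"] * 50
print(cramers_v_corrected(xs, ys))
```

Clamping the corrected phi squared at zero keeps the coefficient inside the 0 to 1 range when the observed association is weaker than the expected bias.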

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
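
Nullity by column amounts to counting the missing cells in each column. A minimal sketch in plain Python, assuming records are stored as dictionaries and that `None` and NaN both count as missing (the helper names and toy records are hypothetical and not taken from the profiled dataset, which has no missing cells):

```python
import math

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def nullity_by_column(rows):
    """Count missing cells per column for a list of dict records."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    return counts

# Toy records that only borrow two column names from the report.
toy_rows = [
    {"tenure_days": 228, "cust_gender_cd": "F"},
    {"tenure_days": None, "cust_gender_cd": "M"},
    {"tenure_days": float("nan"), "cust_gender_cd": None},
]
missing = nullity_by_column(toy_rows)
print(missing)
```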

Sample

First rows

df_index | primary_key | r_age_val | cust_gender_cd | cust_language_cd | cust_mkt_segm_desc | trf_mdl_phonedeal_cd | count_orange | cust_total_mobile_qty | voice_oob_mean | voice_oob_sum | voice_oob_nat_mean | voice_oob_nat_sum | mean_bill_rev_vs_trf_plan | tenure_days | days_since_moving | nb_cont | avg_tp_churn | churned
0010.369409FFRMASS0010.0000000.00000.0000000.00000.826445228365240.0363410
1120.160948MFRMASS00116.33833349.015016.33833349.01500.0000002715365190.0251390
2230.495127FFRMASS0010.0000000.00000.0000000.00000.8353366211365160.0365960
3340.227484MFRMASS1020.0000000.00000.0000000.00000.827435152365150.0240900
4450.213877FNLMASS0011.3495674.04871.3495674.04870.9868983896365150.0354600
5560.467429FNLMASS1010.3443671.03310.0000000.00000.860240148614530.0240900
6670.511718MNLMASS0010.0000000.00000.0000000.00001.0326404762365450.0363410
7780.809214FFRMASS0011.0963003.28891.0963003.28891.1772496143651100.0337410
8890.165527MNLMASS0010.0000000.00000.0000000.00001.377407363136570.0297700
99100.766809MFRMASS0010.0000000.00000.0000000.00000.706924246365210.0532390

Last rows

df_index | primary_key | r_age_val | cust_gender_cd | cust_language_cd | cust_mkt_segm_desc | trf_mdl_phonedeal_cd | count_orange | cust_total_mobile_qty | voice_oob_mean | voice_oob_sum | voice_oob_nat_mean | voice_oob_nat_sum | mean_bill_rev_vs_trf_plan | tenure_days | days_since_moving | nb_cont | avg_tp_churn | churned
99948100990999910.843009FNLMASS0020.0000000.00000.0000000.00001.0478732009365600.0324720
99949100991999920.441968MFRSOHO1020.0000000.00000.0000000.00001.287842786365340.0143740
99950100992999930.117985UnknownFRSOHO0010.1952330.58570.0000000.00001.2524942543365400.0322630
99951100993999940.936916UnknownFRMASS0010.0000000.00000.0000000.00000.8593133839126470.0363410
99952100994999950.532780FFRMASS1030.0000000.00000.0000000.00000.9013924455365210.0240900
99953100995999960.581166FFRMASS1020.0000000.00000.0000000.00000.835612128365890.0351450
99954100996999970.900034MFRMASS0025.80970017.42915.80970017.42910.9998197147365360.0380400
99955100997999980.741915FFRMASS0020.0000000.00000.0000000.00000.8353331472365130.0297700
99956100998999990.067159FNLMASS0110.0000000.00000.0000000.00000.9004046045365510.0324720
999571009991000000.786798MFRMASS0011.1789673.53691.1238673.37160.9082691519365520.0466150